Proteins: Structure, Function, and Bioinformatics
Wiley
Preprints posted in the last 90 days, ranked by how well they match Proteins: Structure, Function, and Bioinformatics's content profile, based on 82 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Spiliopoulou, M.; Schulz, E. C.
Glutamate racemase (MurI) catalyzes the stereochemical interconversion of L-glutamate to D-glutamate, a key element of bacterial peptidoglycan biosynthesis. In this study, we present the crystal structure of Helicobacter pylori glutamate racemase at 1.43 Å resolution. The crystal has monoclinic symmetry, like previously reported models, but different unit-cell parameters. The present model contains a single dimer and retains the previously described head-to-head dimer arrangement. The differences between the models arise from variations in unit-cell parameters, which lead to altered crystal packing interactions rather than changes in the quaternary assembly. The monomeric fold and active-site architecture remain conserved and are consistent with the catalytic features described for bacterial glutamate racemases. This structure provides an updated, high-resolution structural model for H. pylori glutamate racemase and highlights the variability in crystal packing within the same space group.
Makhatadze, G. I.
A variant of the U1A protein containing four substitutions to ionizable residues was generated serendipitously due to a miscommunication. Biophysical measurements show that this variant has at least twice as much helical structure as the wild-type U1A and is trimeric in solution, in contrast to the monomeric wild type. In sharp contrast, structures predicted by deep-learning AI tools (AlphaFold2 and RoseTTAFold2) and transformer-based tools (OmegaFold and ESMFold) are all highly similar to the wild-type U1A (backbone RMSD < 1 Å). Even more surprising, two of the substituted ionizable residues are predicted to be fully buried in the non-polar core of the protein, an outcome that contradicts well-established physico-chemical principles, as ionizable residues are normally located on the protein surface. To explore this effect further, we generated sequences with ionizable substitutions at up to all twelve of the residues that make up the non-polar core of U1A. Across thousands of sequences, and depending on the AI model used, the majority of predicted structures contained fully buried ionizable residues while still maintaining the overall U1A fold. We then examined two additional proteins of comparable size, acylphosphatase and the de novo-designed TOP7 fold, and observed the same phenomenon: AI models frequently predicted structures with buried ionizable residues that nevertheless retained the parent fold. When these AI-predicted structures were subjected to short (50 ns) molecular dynamics simulations using physics-based force fields such as CHARMM or AMBER, the structures rapidly relaxed into ensembles that exposed ionizable residues. We conclude that while AI-based structure prediction tools perform extremely well on naturally occurring sequences, they do not reliably encode the physico-chemical principles governing the placement of ionizable residues. A straightforward remedy is to include a brief molecular dynamics simulation as a final validation step for AI-generated structures.
Maduros, A.; Farinsky, L.; Tagkopoulos, P.; Vater, A.; Siegel, J. B.
This study explores how computational design predictions relate to experimental enzyme behavior by analyzing seven single-point mutants of β-glucosidase B (BglB) from Paenibacillus polymyxa: Y333F, A88E, L219Q, A408H, Y173L, E340S, and Y422F. Each mutation was modeled using Foldit Standalone, and mutants were selected based on predicted thermodynamic stability changes of interest. Six of the seven mutants in this set yielded soluble, expressed protein. With one exception, the variants had catalytic efficiency similar to that of the wild type. The melting temperatures for most variants were also similar to the wild type. Correlation analysis revealed weak but potentially informative relationships between predicted ΔTSE and (a) thermal stability and (b) catalytic efficiency. These results further support known limitations of the TSE score as a tool for single-point mutation design and add to a growing dataset being generated to build the next generation of functionally predictive protein models.
Habibullah, S.; Mondal, D.; Kumar, S.; Reddy, G.
Group I introns are non-coding regions of pre-mRNA that catalyze their splicing from the RNA sequence by folding to a specific structure. We used computer simulations to study the folding mechanism of the P4-P6 domain in the Tetrahymena thermophila group I intron, focusing on the GAAA tetraloop-receptor (TL-R) interaction, which is a ubiquitous tertiary interaction in RNA structures. We show that the intron folds via a multistep pathway, populating seven states with distinct tertiary contacts. Under physiological Mg2+ concentrations ([Mg2+]), the loop-bulge-P4 tertiary interaction is essential to stabilize the docked TL-R complex, whereas in high [Mg2+], the TL-R complex is stable by itself. The solvated Mg2+ ions modulate the TL-R docking-undocking dynamics and stabilize non-native intermediate states. The condensation of Mg2+ in the major grooves of the TL and R helices is critical for them to attain specific stiffness essential for their facile docking. The results highlight the critical role of Mg2+ ions in facilitating TL-R interaction formation, which stabilizes long-range tertiary contacts in RNA structures.
Opdam, L.; Meneghello, M.; Guendon, C.; Chargelegue, J.; Fasano, A.; Jacq-Bailly, A.; Leger, C.; Fourmond, V.
CO dehydrogenases (CODH) are metalloenzymes that reversibly oxidize CO to CO2 at a buried NiFe4S4 active site. The substrates, CO and CO2, must therefore be transported through the protein matrix to reach the active site. The most likely pathway for intra-protein diffusion is the hydrophobic channel identified in the crystal structures. Here, we use site-directed mutagenesis to study the highly conserved isoleucine 563 of Thermococcus sp. AM4 CODH2. Mutations at this position change the biochemical properties (KM for CO, product inhibition constant, catalytic bias...), and increase the resistance of the enzyme to the inhibitor O2, showing that isoleucine 563 indeed lines the gas channel. The I563F mutation decreases the bimolecular rate constant of inhibition by O2 15-fold, and increases the IC50 20-fold, which is the strongest improvement in O2 resistance reported so far. We show that the size of the introduced amino acids is less important than their flexibility, along with the size of the cavity formed near the active site in the channel. We also conclude that O2 access to the active site cannot be slowed down without also affecting CO diffusion. This tradeoff will have to be considered in further attempts to use site-directed mutagenesis to make CODHs more O2 tolerant.
Gizzio, J.; Faezov, B.; Xu, Q.; Dunbrack, R. L.
Humans have 437 catalytically competent protein kinase domains with the typical kinase fold, similar to the structure of Protein Kinase A (PKA). The active form of a kinase must satisfy requirements for binding ATP, magnesium, and substrate. From structural bioinformatics analysis of 248 crystal structures of 54 unique substrate-bound kinases, we derived structural criteria for the active form of typical protein kinases. We include well-known requirements on the DFG motif of the activation loop and the N-terminal domain salt bridge, but also on the positions of the N-terminal and C-terminal segments of the activation loop that must be placed appropriately to bind substrate. With these criteria, only 130 of the 437 human catalytic protein kinases (30%) are in the Protein Data Bank in their active form. Because the active forms of catalytic kinases are needed for understanding substrate specificity and the effects of mutations on catalytic activity in cancer and other diseases, we used AlphaFold2 to produce models of all 437 human protein kinases in the active form. This was accomplished with templates from the PDB that resemble substrate-bound structures, shallow multiple sequence alignments of orthologs and close paralogs of the query protein, and application of the active-kinase criteria to the output models. We selected models for each kinase based on intramolecular ipSAE scores of the activation loop residues of these models, demonstrating that the highest scoring models have the lowest or close to the lowest RMSD to 29 non-redundant substrate-bound structures in the PDB. A larger benchmark of 117 active kinase structures with solved activation loops in the PDB shows that 71% of the highest scoring AlphaFold2 models had backbone RMSD < 1.0 Å to the benchmark structures and 92% were within 2.0 Å. Models for all 437 catalytic kinases are available at https://dunbrack.fccc.edu/kincore/activemodels. We believe they may be useful for interpreting mutations leading to constitutive catalytic activity in cancer, and as templates for modeling the binding of substrates and inhibitors that bind to the active state.
Nandi, P.; Kamal, I. M.; Chakrabarti, S.; Sengupta, S.
The process of DNA transcription leads to the generation of torsional stress, which must be resolved for smooth progression of the transcription machinery. In Saccharomyces cerevisiae, DNA topoisomerase I (Top1), a type IB topoisomerase, plays a critical role in relaxing supercoils and mitigating the topological strain associated with transcription. While several proteins from the transcription machinery have been reported to interact with yeast Top1, detailed characterization and functional relevance of these interactions have remained underexplored. This gap is partly due to the absence of a complete three-dimensional structure of the full-length enzyme, which hinders structure-based computational analyses of its interactome. In this study, we present a template-based model of full-length yeast Top1. Leveraging this model, we investigated its molecular interaction with Rpc82, a key subunit of the RNA polymerase III enzyme, which is responsible for transcribing small non-coding RNAs such as tRNAs and 5S rRNA. Through molecular docking and molecular dynamics simulations, critical residues at the Top1-Rpc82 interface were identified that likely mediate their interaction. Our findings provide new insights into the structural basis of Top1's association with RNA polymerase III and its potential role in regulating Pol III-mediated transcription. The Top1 model developed here offers a valuable framework for future in silico studies aimed at elucidating the broader interactome and regulatory mechanisms of this essential enzyme.
Rodriguez, S.; Fournet, A.; Bartels, S.; Pajkos, M.; Clerc, I.; Carriere, L.; Thureau, A.; Montanier, C.; Dumon, C.; Allemand, F.; Cortes, J.; Bernado, P.
Multidomain proteins connected by flexible linkers populate conformational ensembles that are challenging to characterize using conventional structural biology methods. In domain-linker-domain (DLD) proteins, linker-mediated inter-domain relative positions and orientations are functionally relevant, yet their dynamical behavior in solution normally remains poorly described. Small-angle X-ray scattering (SAXS) provides ensemble-averaged structural information for such systems; however, coupling with computational modeling is required to accurately describe the dynamic behavior of this family of proteins in solution. Here, we present a systematic evaluation of five ensemble-generation strategies applied to a set of eighteen proteins sharing the same two globular domains, connected by naturally occurring linkers of varying length and composition. Modeling methods based on different underlying principles are compared by assessing their agreement with experimental SAXS data, showing a large disparity and systematic structural biases among them. Furthermore, for each approach, we examine the effect of refinement against SAXS restraints and assess its capacity to describe the experimental data, as well as the induced biases in global dimensions and inter-domain distance distributions. This analysis underlines the importance of the initial conformational pool for deriving experimentally compatible ensembles. Overall, this work provides a high-quality benchmark for SAXS-driven ensemble modeling of flexible, multidomain proteins and establishes a framework for the critical interpretation of solution scattering data in systems with pronounced conformational heterogeneity.
Ferdous, S.; Mamun, Y.; Annamalai, T.; Leng, F.; Chapagain, P. P.; Tse-Dinh, Y.-C.
Mycobacterium tuberculosis topoisomerase I (MtbTOP1) is essential for the viability of the causative agent of TB. There are still significant unanswered questions regarding the dynamic conformations during catalysis of relaxation of negatively supercoiled DNA by MtbTOP1. We aim to study the flexible hinge residues that control the dynamics of inter-domain rearrangements involved in the enzyme conformational changes that allow the opening-closing of the topoisomerase gate. We used the online server PACKMAN to predict possible hinges from the MtbTOP1 crystal structure. The predicted region "PRO506 to LEU526" at the border between domains D2 and D4 with a p-value <0.05 was then studied as a potential hinge. The highly conserved ARG516 from this region interacts with the DNA inside the protein toroidal cavity. This arginine maintains inter-domain interaction with GLU207 of D4 and ASP691 of D5 domains. After introducing alanine substitutions, we further studied the mutant topoisomerases in biochemical experiments. The results showed a significant loss in DNA relaxation activity without affecting DNA binding and cleavage after mutating GLU207 and ARG516, consistent with their role as hinge residues in domain rearrangements.
Powell, A.
A methodology for computationally unstructuring proteins is described and the results of its application to a variety of proteins analyzed and discussed. Some proteins prove more susceptible than others, and fold topology plays a part in this. Alpha helical structure is found to be generally somewhat robust, and, perhaps unsurprisingly, unstructuring often begins at exposed chain termini. Phosphofructokinase-1 and phosphofructokinase-2, which have similar sizes but different fold topologies, are found to differ markedly in their unstructuring behaviour.
Hopkins, M. S.; Terwilliger, T. C.; Afonine, P.; Ginn, H. M.; Holton, J. M.
We report the discovery of a new class of local minima that has severely limited the accuracy of macromolecular models. Termed density misfit barrier traps, these minima explain much of the poor fit between macromolecular models and experimental data relative to that of smaller molecules: not just high R factors, but distorted chemical geometry. We postulated that proteins exist as an ensemble of conformations that each have good geometry, but refinement algorithms have been unable to converge to them due to a tangling phenomenon arising from these traps. To demonstrate, a synthetic ground truth data set was generated, consisting of a 2-member ensemble with excellent geometry. A series of starting models, each trapped in increasingly difficult local minima, were prepared, a unified validation score defined, and an open Challenge issued. This Challenge inspired algorithms for escaping such traps, and new programs have been released that are expected to substantially improve the accuracy of macromolecular ensemble models. Synopsis: A synthetic 2-member conformational ensemble of a small protein and corresponding electron density data was generated to demonstrate how topological local minima hinder simultaneous agreement with density data and chemical geometry restraints in conventional structure refinement.
Sophocleous, G.; Owen, D.; Mott, H. R.
The protein kinase C-related kinase (PKN) family of serine/threonine kinases consists of PKN1, PKN2 and PKN3, all of which are Rho family GTPase effectors. PKNs have three N-terminal Homology Region 1 (HR1) domains (HR1a, HR1b and HR1c), which form antiparallel coiled coils and in two cases interact with Rho family GTPases, activating the kinase. The PKNs are implicated in several important cellular processes, including cytoskeletal regulation, cell adhesion, gene expression and cell cycle progression, and are also implicated in cancer. Here we have investigated the roles of the HR1 domains in PKN oligomerization. We show that PKN1 HR1a is a dimer and that the HR1c domain drives further oligomerization. We have mapped the interactions between the HR1 domains and used an integrative approach to model HR1-containing PKN1 dimers. Biophysical analysis shows that RhoA forms a 1:2 complex with HR1a, resulting in a rearrangement of the HR1a dimer, an outcome supported by SAXS models. In contrast, Rac1 binds to monomeric HR1a, suggesting that this GTPase activates PKN1 via a different mechanism. These data provide structural insight into interactions between HR1 domains and the Rho family proteins and their potential consequences for PKN1 activation.
Bugrova, A.; Orekhov, P.; Gushchin, I.
Recently developed deep learning-based tools can effectively generate structural models of complexes of proteins and non-proteinaceous compounds. While some of their predictive capabilities are truly exciting, others remain to be thoroughly tested. Here, we probe whether the ligand input format (Chemical Component Dictionary, CCD, or Simplified Molecular Input Line Entry System, SMILES) and charge (which depends on protonation) affect the results of the predictions by four popular algorithms: AlphaFold 3, Boltz-2, Chai-1, and Protenix-v1. We chose methylamine and acetic acid as two of the simplest titratable chemicals that are omnipresent in proteins as amino and carboxy moieties, and are consequently ubiquitous in the Protein Data Bank models that are most commonly used for training. Unexpectedly, we found that for both molecules, in many cases the input format affected the prediction results, and did so much more strongly than protonation, whereas changes in the formally specified charge of the molecules did not lead to the changes in binding expected from experiments. We conclude that (i) ensuring identical results irrespective of input formats and (ii) inclusion of protonation-related steps into training and prediction pipelines are the two available paths for improvement of protein-ligand structure prediction algorithms.
Do, Q. H.; Kim Cavdar, I.; Grozdanov, P.; Theriot, J. J.; Ramani, R.; Jansen, M.
Nicotinic acetylcholine receptors (nAChRs) belong to the pentameric ligand-gated ion channel superfamily (pLGICs). Among them, the neuronal homomeric α7 nAChR is highly permeable to calcium and plays critical roles in synaptic transmission, cell signaling, and inflammation modulation. The biogenesis of α7 nAChRs is enhanced by the chaperone proteins RIC-3 and NACHO. Previously, we reported a motif in the 5-HT3A receptor, another pLGIC, involved in RIC-3 modulation. Residues in this motif are conserved and also found within the L1-MX segment of the α7 nAChR subunit. We therefore explored the regulatory roles of these conserved residues in the biogenesis of α7 nAChRs using multiple approaches, including heterologous expression in Xenopus laevis oocytes, mutagenesis, pull-down assays, cell-surface labeling, and two-electrode voltage-clamp (TEVC) recordings. We find that synthetic α7 L1-MX peptide interacts with both RIC-3 and NACHO. In particular, conserved residues W330, R332, and L336 in the L1-MX segment positively regulate the assembly of α7 oligomers and the biogenesis of α7 nAChR. In the presence of residues W330, R332, and L336, NACHO promotes the assembly of an α7 pentamer that is resistant to strong denaturing conditions. The NACHO-promoted α7 pentamer is also resistant to the Endo H enzyme. Sensitivity of the pentamer to moderate temperatures (37 °C, 45 °C, and 50 °C) suggests that NACHO stabilizes the pentamer via non-covalent interactions. In contrast, Ala replacements at these residues disrupt the biogenesis and abolish α7 current. NACHO and RIC-3 co-expression yields partial rescue of functional expression for some Ala replacement constructs. Summary: This work identifies regulatory roles of conserved residues W330, R332, and L336 in the biogenesis of α7 nAChR. This discovery positions the MX subdomain as a promising target for future drug development that can minimize adverse effects.
Luo, Y.; Chen, X.; Lin, X.; Liao, W.; Xiao, B.; Li, M.; Qiu, Z.; Wilson, T. J.; Miao, Z.; Wang, J.; Huang, L.; Lilley, D. M. J.
We have determined the molecular structure and investigated the catalytic mechanism of two new ribozymes of the Hepatitis delta virus family, found in the nematode Caenorhabditis briggsae and in a virus of the family Ackermannviridae. Crystal structures of both conform to the double-pseudoknot architecture adopted by the viral HDV ribozyme. The structure of the C. briggsae ribozyme has been determined both pre- and post-cleavage. In the former, both nucleotides flanking the scissile phosphate are observed, along with a metal ion, and cytosine 75 N3 bound to the O5′ leaving group. The pH dependence of the cleavage rate reveals a pKa of 6.6, which together with the inactivity of a C75U mutant provides evidence for the role of C75 as the general acid. In contrast to other nucleolytic ribozymes that use catalytic metal ions, the reaction rate does not depend on the pKa of the divalent metal ion. Limited adjustment of the structure of the active center is consistent with direct bonding of the metal ion to the O2′ and a non-bridging O, suggesting that the ion acts as a Lewis acid to activate nucleophilic attack. This mechanism appears to be general for the HDV ribozyme class, and distinguishes it from the majority of nucleolytic ribozymes that use general base catalysis.
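To make the reported pKa of 6.6 concrete, the sketch below evaluates the Henderson-Hasselbalch relation for a single titratable group. It is only an illustration: the abstract does not give the kinetic model fitted to the pH-rate data, and the function name `fraction_protonated` and the chosen pH values are assumptions introduced here.

```python
# Illustrative only: Henderson-Hasselbalch for one titratable group.
# This is NOT the authors' kinetic model; names and pH values are hypothetical.

def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a titratable group that is protonated at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

PKA_REPORTED = 6.6  # pKa quoted in the abstract above

for ph in (5.6, 6.6, 7.6):
    frac = fraction_protonated(ph, PKA_REPORTED)
    print(f"pH {ph:.1f}: protonated fraction = {frac:.3f}")
```

A general acid must be protonated to donate its proton, so under this simple picture half of the groups are in the active form at pH 6.6 and the fraction falls roughly tenfold for each pH unit above the pKa.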
Mishra, P.; Chazin-Gray, A. M.; Lamon, G.; Kim, D. E.; Baker, D.; Traaseth, N. J.
Multidrug efflux pumps transport antibiotics across the cellular membrane, conferring resistance on the host organism. Efflux pump inhibitors (EPIs) potentiate the efficacy of antibiotics by blocking drug efflux and hold promise as adjuvant therapeutics in the fight against multidrug-resistant pathogenic bacteria. A hurdle in the field has been the lack of selectivity of small-molecule EPIs, which often display off-target toxicity due to non-specific binding. To tackle this specificity challenge, we aimed to maximize an inhibitor's binding surface area to efflux pumps by designing miniprotein EPIs using computational protein design and an E. coli co-expression assay to screen inhibition in cells. We used S. aureus NorA as a model efflux transporter since it confers drug resistance to fluoroquinolones, puromycin, and other cytotoxic compounds. Starting from a focused miniprotein library of only 86 members, we identified inhibitors in the screen that blocked NorA transport under active efflux conditions in vitro. Our most promising inhibitor, I-23, was validated by solving a cryo-EM structure of the miniprotein in complex with NorA, which stabilized the transporter in the outward-open conformation. I-23 has a ferredoxin-like fold with one of its β-hairpins inserted into the substrate binding pocket of NorA and other parts of the globular fold occupying the shallow pocket and making extensive intermolecular contacts with NorA. An arginine residue on the tip of the hairpin loop was positioned near an anionic patch required for NorA antibiotic efflux. The structural motifs identified in this work could be employed to explore the molecular properties of peptidoglycan penetration; full realization of the therapeutic potential of the designed miniprotein inhibitors will require determining the principles for facilitating passage of ~7 to 8 kDa miniproteins across the peptidoglycan bacterial cell wall.
Shriver, T.; Berndt, S.; Robson, S. A.; Dixon, A. D.; Liebscher, I.; Ziarek, J. J.
Several members of the adhesion subfamily of G protein-coupled receptors (aGPCRs) are capable of self-activation by an internal agonist sequence (aka the Stachel) that is exposed upon removal or conformational changes of the N-terminal fragment of the receptor. Synthetic peptides derived from the Stachel sequence can be used as exogenous agonists. In the inactive form, the Stachel is sequestered as the β13-strand within the GPCR Autoproteolysis-INducing (GAIN) domain, but it engages the seven transmembrane region as a helix when it is either an intramolecular sequence or a synthetic peptide. Little is known about the molecular details underlying this transition, but we hypothesize that a disordered conformation is central to this intermediate state in receptor activation. Although AlphaFold3 and Pepfold4 predict primarily helical Stachel models with high confidence for the entire aGPCR subfamily, disorder predictions and biophysical experiments reveal a predominantly disordered conformation in solution. Investigating the ADGRG6/GPR126 Stachel peptide, circular dichroism (CD) and nuclear magnetic resonance (NMR) experiments reveal a predominantly random coil conformation in aqueous buffer, polar detergent micelles, and zwitterionic lipids. Titration of trifluoroethanol uncovered a two-state equilibrium between an unfolded and a helix-containing conformation, with NMR localizing a single-turn helix to residues L846-L849. Taken together, these data indicate that the ADGRG6/GPR126 Stachel peptide is primarily disordered, but small populations may adopt a helix-containing conformer that seems to support a conformational-selection activation mechanism.
Broster, J. H.; Popovic, B.; Kondinskaia, D.; Deane, C. M.; Imrie, F.
Molecular docking aims to predict the binding conformation of a small molecule to its protein target. Recent work has proposed diffusion models for this task, from rigid-body docking that diffuses over ligand degrees of freedom to co-folding approaches that jointly generate protein structure and ligand pose. However, diffusion-based docking models have been shown to frequently produce physically implausible poses and fail to consistently recover key protein-ligand interactions. To address this, we introduce a reinforcement learning framework for training diffusion-based docking models directly on non-differentiable objectives. Fine-tuning DiffDock-Pocket for physical validity with our approach substantially increases the number of generated poses that are physically valid and interaction-preserving, with no increase in inference-time compute. Importantly, this comes without sacrificing structural accuracy; in fact, our approach increases the proportion of structures with near-native poses. These effects are most pronounced for protein targets that are dissimilar to the training data. Our fine-tuned DiffDock-Pocket model outperforms both classical docking algorithms and machine learning-based approaches on the PoseBusters set. Our results demonstrate that reinforcement learning can teach diffusion-based docking models to better respect physical constraints and recover key interactions, without the requirement to rely on inference-time corrections.
Guy, H. R.; Durell, S. R.; Shafrir, Y.
Soluble oligomers and transmembrane channels formed by the 42-residue variant of amyloid beta (Aβ42) play key roles in Alzheimer's disease. Unfortunately, detailed structures of these assemblies have not been determined. Our group addresses this problem by developing atomic scale models. Previously we proposed that both soluble Aβ42 oligomers and transmembrane channels have symmetric concentric β-barrel structures. Here we expand this hypothesis to include GM1 gangliosides and sometimes cholesterol and lattice models of channel assemblies. The presence of GM1 gangliosides increases the toxicity of Aβ42, enhances its ability to penetrate liposome membranes, and facilitates interactions between adjacent liposomes. Although the conformations of numerous model assemblies vary, in these models the carboxyl group of GM1 always binds to side-chains of histidine 13 and/or histidine 14. Our soluble oligomer models are consistent with electron microscopy images of beaded annular protofibrils. Our models of membrane-bound assemblies are consistent with the following: freeze-fracture and atomic force microscopy images of Aβ42 in lipid bilayers, secondary structure results, the calcium hypothesis of Alzheimer's disease, effects of lithium depletion on AD, established β-barrel theory, and energetic criteria.
Pubal, K.; Kushnir, K.; Spiwok, V.; Louzecka, K.; Setnicka, V.; Lipovova, P.
Proteins are built from 20 canonical amino acids. It is interesting to explore whether proteins can be formed from significantly reduced amino acid alphabets. Our bioinformatics survey of UniProt (more than 250 M sequences) revealed that proteins composed of reduced amino acid alphabets (fewer than 10 amino acids) are extremely rare among existing proteins. Next, we used computational protein design to design proteins composed of all 1,013 possible alphabets of 2-10 early amino acids (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val). The length of all proteins was 100 amino acid residues. Small amino acid alphabets preferred simple helices or helix bundles. Larger amino acid alphabets allowed for the design of more complex structures. A protein composed of 8 amino acids (Ala, Asp, Gly, Leu, Val, Ser, Thr, and Pro) was successfully verified experimentally. It belongs to the fibronectin type III domain, a β-sheet-rich architecture. Attempts to experimentally verify designs composed of 6 and 4 amino acids were unsuccessful. We show by a computational experiment, without experimental validation, that inverse folding programs, namely ProteinMPNN, can stabilize designed proteins within the same amino acid alphabet. Our results show that globular proteins may have formed early in evolution. Furthermore, we show that it is possible to design proteins with interesting properties for biotechnology and synthetic biology.
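The figure of 1,013 alphabets quoted above can be checked by counting every subset of size 2 to 10 drawn from the ten listed early amino acids. The sketch below is a minimal check written for this listing, not the authors' design pipeline; the names `EARLY` and `reduced_alphabets` are hypothetical.

```python
# Minimal check of the alphabet count; helper names are illustrative assumptions.
from itertools import combinations
from math import comb

# The ten early amino acids listed in the abstract.
EARLY = ["Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"]

def reduced_alphabets(residues, min_size=2, max_size=10):
    """Yield every subset of `residues` containing min_size..max_size members."""
    for k in range(min_size, max_size + 1):
        yield from combinations(residues, k)

alphabets = list(reduced_alphabets(EARLY))
n_alphabets = len(alphabets)

# Closed form: sum of binomial coefficients C(10, k) for k = 2..10.
closed_form = sum(comb(len(EARLY), k) for k in range(2, 11))

print(n_alphabets, closed_form)  # prints "1013 1013"
```

The count equals 2^10 minus the empty set and the ten single-residue alphabets, that is 1024 - 1 - 10 = 1013, which matches the number given in the abstract.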